ABSTRACT


One technique for displaying a set of quantitative variables
is to represent the set as a polygon. Such displays
allow the observer to visualize complex information quickly,
as a whole. Polygon displays have been employed to display
information for analysis, status, or presentation. An experimental
investigation was undertaken to ascertain the effect
of variation in certain visual features of the display
on the consistency with which people categorize information
presented as polygons. Variables included background information
of the display, shading, and form. Subjects performed
a categorization task on two sets of data; the results are
analyzed for consistency between individuals and for consistency
with certain standard clustering algorithms. The
effects of distinctive portions of the figures on the judgment
of similarity, and of the nature of the data and of
interactions of combinations of the variables used in the
experiment on the consistency of clustering were noted. Implications
for the design of polygon displays are discussed.
CHAPTER ONE -- GRAPHICS AND PERFORMANCE
Introduction 

The  representation  of  quantity  seems  to  have  developed 
at  the  same  time  as  written  language  in  human  history,  but 
only  recently  have  graphic  forms  for  representing  quantita¬ 
tive  ideas  been  developed.  The  concept  of  drawing  upon  the 
human  visual  system's  capacity  for  perceiving  and  comparing 
patterns,  thus  allowing  the  integration  of  large  numbers  of 
individual  information  items,  seems  to  have  flowered  in  the 
late  eighteenth  century,  particularly  in  the  work  of  William 
Playfair  (see  examples  in  Tufte,  1983).  While  interest  in 
graphical  presentation  has  varied  over  the  years  since,  in 
many  respects  the  field  has  not  progressed  beyond  these  early 
works.  The  last  two  decades  have  seen  a  renewed  enthusiasm 
for  graphical  methods  of  data  presentation,  drawing  in  part 
on  an  increasing  emphasis  on  visual  media  and  on  the  growing 
capabilities  of  computing  and  related  machinery.  One  of  the 
areas  of  new  interest  is  graphic  representation  in 
multivariate  statistics,  a  field  which  itself  owes  much  re¬ 
cent  development  to  applications  of  the  computer. 

Work  Relating  to  Graphics  and  Human  Performance 

Work  on  the  graphic  representation  of  quantitative  in¬ 
formation  is  found  in  the  literature  of  a  number  of  different 
disciplines,  each  of  which  propounds  its  own  point  of  view. 
In  reviewing  this  work  one  must  be  prepared  to  range  over  a 
broad spectrum of fields, from statistics, graphical arts and
cartography to education and ergonomics. Within this variety,
though, and, indeed, maybe because of it, there has developed
no accepted theoretical basis for visual graphics,
nor  even  a  consistent  body  of  terminology.  At  the  most  basic 
level,  for  example,  there  is  the  inconsistency  in  the  use  of 
the  terms  "graph"  and  "chart,"  compounded  by  the  word 
"graphic";  these  terms  are  found  interchangeably. 
MacDonald-Ross  (1977)  gives  a  brief  glossary  and  discussion 
of  certain  inconsistencies,  such  as  those  associated  with 
"nomograph";  various  handbooks  also  provide  sets  of  terms. 
There  is  no  generally  accepted  classification  of  graphic 
techniques.  Furthermore,  and  what  is  of  interest  here,  there 
has  only  recently  evolved  the  concept  of  investigating  the 
characteristics  and  forms  of  various  graphical  presentations 
by  observing  or  measuring  the  performance  of  the  user  (see 
Kruskal, 1975).

Tufte's  recent  book  (1983)  provides  a  good  introduction 
and  background  to  quantitative  graphics,  as  well  as  examples 
of  some  of  the  best  of  the  "art."  Beniger  and  Robyn  (1976) 
provide  a  brief  historical  overview;  Feinberg  (1979)  has  re¬ 
viewed  developments  in  statistical  graphics  and  noted  the 
paradoxical  trend  of  recent  renewed  interest  in  graphics  but 
their  generally  decreasing  use  in  technical  publications. 
Cleveland  (1984b)  has  also  surveyed  usage  in  scientific  pe¬ 
riodicals;  Wainer  and  Thissen  (1981)  also  provide  an  overview 
of  recent  developments  with  good  examples.  The  last  two 
decades have been a period of innovation, particularly in
exploratory  methods  of  data  analysis,  in  probability  plotting 
procedures,  and  in  multivariate  techniques.  Izenman  (1980) 
has  comprehensively  reviewed  the  contributions  during  this 
period.  To  aid  those  who  employ  visual  graphics,  a  number 
of  handbooks  or  manuals  of  techniques  have  appeared,  based 
on  intuition  and  aesthetics;  those  by  Schmid  and  Schmid 
(1979),  Schmid  (1983),  and  Chambers,  Cleveland,  Kleiner,  and 
Tukey  (1983)  are  recommended.  Furthermore,  there  have  been 
some  efforts  expended  at  various  times  toward  the  establish¬ 
ment  of  standards  for  graphic  presentation  (Schmid,  1976). 
These  efforts  often  appear  to  have  had  little  impact,  though; 
examples  of  poor  graphics  continue  to  appear  regularly, 
ranging  from  those  that  are  merely  confusing  to  some  that  are 
deceptive  (Wainer,  1980;  1984). 

Although  new  methods  and  forms  of  the  graphical  presen¬ 
tation  of  quantitative  data  have  been  developed  since  the 
time  of  Playfair,  there  has  been  relatively  little  empirical 
evidence established for the preference of particular methods
in  a  given  circumstance,  or  for  the  use  of  particular  fea¬ 
tures  of  a  graphic  type  to  best  serve  the  function  intended. 
Several  articles  have  at  least  partially  reviewed  the  work 
that has been done (Feinberg, 1979; Kruskal, 1982;
MacDonald-Ross, 1977; Wright, 1977). It is interesting to
note  that  Charles  Babbage,  the  forefather  of  computing  ma¬ 
chinery,  was  one  of  the  first  to  express  concern  for  the 
presentation of data and its effect on the observer (see
Kruskal, 1982). During the 1920s and 1930s some studies
were undertaken, primarily by statisticians, to contrast the
relative merits of the bar and circle (or pie) graphs
(Croxton, 1927; Croxton and Stein, 1932; Croxton and Stryker,
1927;  Eells,  1926;  Huhn,  1927;  Graham,  1937).  There  are 
problems  with  generalizing  from  these  early  experiments, 
though, due to the limited variety of graphical representation
and some methodological considerations; further, their
results  are  at  times  inconsistent. 

This "Bar-Circle Debate," as Kruskal (1982) has termed
it,  is  continuing  to  the  present.  Peterson  and  Shramm  (1954) 
found  circles  to  be  more  accurately  used;  Culbertson  and 
Powers  (1959)  found  multiple  bars  better.  Cleveland  and 
McGill  (1984),  in  a  sound  article  attempting  to  generate  a 
theoretical  basis  for  graphic  perception,  reported  that  their 
subjects could estimate proportion significantly more accurately
from multiple or grouped bars than from circle graphs.
This  result,  in  part,  sustains  their  hypothesis  that  judg¬ 
ments  of  position  along  a  scale  are  better  than  those  of  an¬ 
gle  or  area.  Further,  they  proposed  using  dot  charts  in 
place  of  both  (Cleveland  and  McGill,  1984;  Cleveland,  1984a). 
This  long-running  controversy  has  spilled  over  into  the 
cartographic  literature,  as  will  be  mentioned  later. 

This  debate  is  indicative  of  the  problems  entailed  in 
relating  human  performance  and  graphics.  Bar  and  pie  charts 
continue to be the most frequently used graphs in most
fields; many software packages will turn them out with ease.
And yet there is still no solid basis for using or condemning
(completely) one form or the other, or for identifying the
features that will make either best convey what it is
supposed to convey. Cleveland and McGill's work is solid, and
it is hoped that more such work will be undertaken to provide a
firmer basis for guidance to the designers of graphs. But
we should not overlook the complexity of even these simple
graph forms.

Kruskal  (1982)  has  a  good  discussion  of  the  problems  of 
criteria  for  judging  graphs.  In  their  article,  for  example, 
Cleveland  and  McGill  (1984)  demonstrate  the  advantage  of 
showing  the  difference  between  five  approximately  equal 
portions  of  a  whole,  by  using  dot  rather  than  circle  charts, 
but  we  should  ask  whether  it  is  the  differences  we  actually 
want  to  convey,  or  is  the  approximate  similarity  more  impor¬ 
tant.  If  the  latter,  the  circle  would  seem  to  work  at  least 
as  well.  Secondly,  the  impact  of  portraying  the  division  of 
a  whole  of  something  is  more  forceful  in  the  circle  graph. 

As  they  point  out,  these  forms  are  usually  used  for  data 
presentation  rather  than  exploration;  in  such  use  many  fac¬ 
tors  must  be  considered.  The  study  of  graphical  perception 
and  use  is  only  beginning. 

Following  the  Second  World  War  better  experimental 
techniques were applied to the investigation of the relation
of graphic characteristics and performance for functional
values  presented  as  graphs  and  as  tables  (see  Carter,  1947a; 
1947b).  This  work  was  followed  more  than  a  decade  later  by 
investigations  of  trend  representations  in  graphs  by  Schutz 
(1961a;  1961b). 

The  recent  period  of  interest  in  graphics  has  seen  a 
variety  of  new  techniques  and  formats  being  proposed,  par¬ 
ticularly  in  the  field  of  statistics.  As  computer  technology 
has  advanced,  new  graphical  forms  have  been  developed  to  take 
advantage  of  the  computer's  capacity  for  data  manipulation. 
Research  concerning  the  relation  of  graphic  techniques  and 
performance  has  also  gained  interest.  Wainer  has  proposed 
the  development  of  a  body  of  empirical  results  which  could 
aid  graphic  designers  in  choosing  appropriate  parameters  for 
specific  purposes.  He  conducted  several  experiments  with 
this intent, investigating the use of rootograms, a graphical
representation  of  the  residuals  of  the  root  of  nonlinear  fit, 
and  a  comparison  of  graph  types  (1974;  Wainer  and  Reiser, 
1976).  Examples  of  a  similar  nature  include  investigations 
of  correlation  estimation  parameters  (Cleveland,  Harris,  and 
McGill,  1983),  the  use  of  bar  graph  displays  for  process 
control  (Verhagen,  1981)  and  further  investigations  of  the 
relationship  between  tabular  and  graphic  display  (Feliciano, 
Powers,  and  Kearl,  1963;  Ghani  and  Lusk,  1982;  Powers, 
Lashley,  Sanchez,  and  Schneiderman,  1984;  Remus,  1984). 
Considering the widespread use of graphics for the representation
of data and the availability of computer programs to
produce displays of data, though, there has been relatively
little work to indicate which types or characteristics of a
given type of display are best for fulfilling a particular
purpose.

The  use  of  graphic  representation  has  also  been  studied 
in  the  field  of  education.  There  has  been  some  indication 
that  graphics  are  not  intrinsically  more  interpretable  than 
textual  material  and  that  their  redundant  use  may  be  detri¬ 
mental  (Feliciano  et  al.,  1963;  Preece,  1983;  Roller,  1980; 
Vernon, 1946, 1950). Other work has investigated certain
further aspects of the use of graphics in education (see
Eggen,  Kauchak,  and  Kirk,  1978;  Kirk,  Eggen,  and  Kauchak, 
1978;  Washburne,  1927).  Cartographers  have  also  shown  con¬ 
cern  for  the  relationship  between  the  characteristics  of 
graphics  and  their  capacity  to  communicate  information 
quickly  and  accurately  (see  for  review,  Phillips,  1979; 
Potash,  1977).  Thematic  maps  are  a  special  class  of  quanti¬ 
tative  data  graphics.  The  use  of  symbols,  particularly 
graduated  circles  and  circle  graphs,  has  been  extensively 
investigated  (see  Chang,  1977;  Cleveland,  Harris,  and  McGill, 
1982;  Cox,  1976;  Flannery,  1971;  Meihoefer,  1973),  as  well 
as  various  aspects  of  the  use  of  color  for  representing  area 
or  quantitative  values  (Cleveland  and  McGill,  1983;  Dobson, 
1980;  Wainer  and  Francolini,  1978).  Considering  the  wide¬ 
spread  use  of  data  graphics  in  texts  and  maps  more  study  of 
their role in the communication of information is surely
needed.

Among  the  innovative  graphic  techniques  which  have  been 
proposed  in  recent  years,  those  which  deal  with  multivariate 
data  have  generated  some  interest.  These  techniques  derive 
from  the  general  increase  in  work  on  multivariate  analysis 
which  has  accompanied  the  growth  in  computer  applications  in 
statistics. Feinberg (1979) and Wainer (Wainer and Thissen,
1981)  devote  sections  of  their  reviews  to  these  developments, 
along  with  providing  illustrative  examples.  Everitt  (1978) 
has  a  volume  on  the  graphical  presentation  of  multivariate 
data;  Chambers,  Cleveland,  Kleiner,  and  Tukey  (1983)  include 
a  chapter  on  multivariate  methods  in  their  recent  work  on 
graphics  for  data  analysis. 

One group of techniques for the representation of
multivariate quantitative data, termed point representation,
uses a particular symbol or icon to represent each point.
The different techniques vary from each other primarily in
their  use  of  different  symbols,  ranging  from  profiles  or  bar 
charts  to  characterized  representations  of  the  human  face 
(see  Figure  1  for  examples,  from  Kleiner  and  Hartigan,  1981). 
The  basic  procedure  is  to  represent  each  object  of  the  set 
of  objects  to  be  compared  as  an  individual  graphical  unit,  a 
symbol  whose  appearance  is  determined  by  the  values  of  the 
variables  measured  for  the  object.  The  objects  of  the  set 
can  then  be  compared  by  observing  the  set  of  symbols  thus 
generated. Each symbol provides a single pattern of that
data  point.  Such  techniques  are  usually  employed  for  pre¬ 
liminary  cluster  identification,  detection  of  unusual  data 
points,  and,  to  a  lesser  degree,  trend  identification.  Among 
the  advantages  often  cited  for  iconic  displays  is  their  ca¬ 
pacity  for  each  to  be  perceived  in  a  holistic  fashion,  al¬ 
lowing  comparison  of  points  by  the  comparison  of  the  single 
image  of  each.  They  allow  the  user  to  observe  the  structure 
of  the  data,  by  eye,  directly  from  the  data  values  and  with¬ 
out  the  scaling  and  correlation  metrics  necessary  for  joint 
representations.  Further,  most  of  these  techniques  are  de¬ 
signed  to  make  use  of  computer  processing  and  graphic  capa¬ 
bilities,  allowing  the  creation  of  automated  systems  for 
producing graphics for initial or exploratory analysis of
data.

These  techniques  have  been  employed  for  data  analysis 
in  a  number  of  fields,  at  least  experimentally.  Friedman, 
Farrell,  Goldwyn,  Miller  and  Siegel  (1972)  reported  on  the 
use  of  polygons  to  classify  pathophysiological  stages  in 
septic  shock;  Jacob  (1978)  reported  on  the  use  of  cartoon 
faces,  proposed  by  Chernoff  (1972),  for  classification  of 
personality  profiles.  Examples  of  applications  in  other 
areas  include  those  reported  by  Bruckner  (1978)  at  Los  Alamos 
Laboratories  and  by  McDonald  and  Ayers  (1978)  at  General  Mo¬ 
tors.  An  interesting  use  of  faces  in  presenting  basic  sta¬ 
tistical  concepts  to  beginning  students  has  been  proposed  by 
Pickover (1984). Bertin's recent work (1981) contains a
number  of  examples  of  the  graphical  analysis  of  multivariate 
data  sets  in  a  variety  of  fields.  In  his  introductory  chap¬ 
ter  he  outlines  the  use  of  his  matrix  procedure  to  analyze 
occupancy  data  for  a  resort  hotel  and  shows  how  such  analysis 
can  be  used  for  management  decisions  regarding  establishment 
of rates. The technique first establishes a data matrix,
then  converts  each  row  to  a  profile  representation  (or  other 
visual  variable),  and  clusters  the  data  by  permutation  of  the 
rows.  The  display  can  be  used  as  a  method  of  exploratory 
analysis,  from  which  further  data  analysis  can  proceed,  as 
in this case, by examining the characteristics of the clusters.
The display, with some modification, can also be used
to communicate the results to others. Thus, this type of
display can fulfill two of the three purposes which are
commonly cited for the graphic presentation of data:
analysis, communication, and computation (e.g., see Chernoff,
1978;  Tufte,  1983). 

The  relationship  between  these  point  or  iconic  repre¬ 
sentation  techniques  and  performance  has  been  explored  in 
several  studies.  Jacob  (1978;  Jacob,  Egeth,  and  Bevan,  1976) 
has  reported  on  a  series  of  experiments  concerned  primarily 
with  the  use  of  Chernoff  faces.  In  a  comparison  of  faces  with 
polygons  and  digits  for  clustering  data  with  nine  variables, 
by  pattern  recognition,  faces  were  found  to  be  significantly 
more accurate; polygons were as fast, but were less accurate.
In a comparison of forms in a paired-associate learning task,
faces  tended  to  be  better  than  the  other  forms.  Mezzich  and 
Worthington  (1978)  investigated  pattern  recognition  of  data 
by  comparing  seven  forms  of  representation,  including  linear 
profiles,  circular  profiles  (polygons),  faces,  linear  and 
polar  Fourier  series,  factor  analysis  in  two  dimensions,  and 
an  ordinal  multidimensional  scaling.  The  data  consisted  of 
the values of 17 variables for four archetypical psychological
patients as assigned by 11 psychiatrists. The factor
analysis  and  multidimensional  scaling  representations  were 
found  to  provide  the  best  performance,  with  the  polar  Fourier 
icons  next.  Chernoff  and  Rizvi  (1975)  investigated  the  ef¬ 
fects  of  the  relationship  of  the  features  in  using  faces  for 
data  representation  by  randomly  changing  the  features  as¬ 
signed  to  the  variables.  Such  changes  were  found  to  affect 
the  results  of  a  classification  task  (dichotomous  clustering) 
by  about  25  per  cent.  A  similar  problem  of  the 
interrelatedness  of  variable  assignment  and  the  perception 
of  form  is  cited  for  other  forms  of  iconic  representation 
(see  Bruckner,  1978;  Egeth,  Jacob,  Wainer,  Kleiner,  and 
Hartigan, 1981; Kleiner and Hartigan, 1981; Naveh-Benjamin
and  Pachella,  1982). 

Wilkinson  (1982)  compared  the  performance  for  four  icon 
types,  blobs  (polar  Fourier  series),  castles,  faces,  and 
polygons,  at  two  levels  of  dimensionality,  by  having  subjects 
judge  dissimilarity  in  pairwise  comparisons.  There  were 
found to be significant differences in reliability and validity
among display types, with faces proving better in
both.  There  was  also  a  significant  dimensionality  effect, 
but  the  interaction  between  dimensionality  and  type  was  not 
significant.  Freni-Titulaer  and  Louv  (1984)  compared  cas¬ 
tles,  trees  (Kleiner  and  Hartigan,  1981),  bar  profiles,  and 
bar  profiles  with  the  variables  ordered  according  to  hierar¬ 
chical  clustering.  Subjects  sorted  the  stimuli  into  two 
clusters;  time  and  accuracy  were  measured.  Trees  were  clus¬ 
tered  more  quickly  and  more  accurately  than  the  other  forms. 
Some  effects  were  also  found  from  differences  in  the  data 
sets  used  to  generate  the  graphics. 

These  studies,  for  the  most  part,  have  been  comparative 
in  nature;  there  has  been  little  effort  to  identify  which  of 
the  point  representations  functions  best  in  particular  cir¬ 
cumstances,  with  particular  types  of  data,  or  for  particular 
tasks.  Some  forms  have  not  been  used  in  the  studies; 
Bertin's  thorough  and  thoughtful  work  (1981;  1983),  for  ex¬ 
ample,  has  been  mentioned  only  in  passing  in  most  studies. 
Moreover,  there  has  been  little  investigation  of  the  re¬ 
lationship  between  the  graphic  characteristics  of  a  partic¬ 
ular  type  of  representation  and  the  performance  with  that 
graphic.
CHAPTER TWO -- THE PRESENT STUDY
Background  for  the  Display  and  the  Task 

This  present  study  was  designed  to  investigate  in  more 
depth  the  effect  of  variation  in  certain  characteristics  of 
the  iconic  representation  called  Polygons  (or  variously, 
Stars  or  Snowflakes)  on  the  ability  of  individuals  to  perform 
a  clustering  task.  Polygons  are  formed  by  representing  the 
value  of  the  variables  measured  for  each  object  as  a  point 
along  radii  of  a  circle  and  connecting  these  points.  Bas¬ 
ically  each  polygon  is  a  profile  representation,  or  line 
graph,  in  polar  co-ordinates. 
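
The construction just described can be sketched in a few lines of Python. This is an illustration only, not the program used in the study: the function name polygon_vertices, the scaling of each variable by a supplied maximum, and the choice to place the first variable at the top and proceed clockwise are all assumptions made for the example.

```python
import math


def polygon_vertices(values, max_values, radius=1.0):
    """Return the (x, y) vertices of a polygon icon for one object.

    Each variable is given its own radius of a circle; the variable's
    value, scaled by an assumed maximum, sets the distance of the
    vertex from the centre. Joining the vertices in order (and closing
    the figure) gives the polygon.
    """
    n = len(values)
    if n < 3 or n != len(max_values):
        raise ValueError("need at least three variables, one maximum each")
    vertices = []
    for k, (value, vmax) in enumerate(zip(values, max_values)):
        # Assumed layout: first radius points up, the rest follow clockwise.
        angle = math.pi / 2 - 2 * math.pi * k / n
        r = radius * value / vmax
        vertices.append((r * math.cos(angle), r * math.sin(angle)))
    return vertices
```

An object with every variable at its maximum yields a regular polygon; lower values pull the corresponding vertices toward the centre, which is what gives each object its distinctive shape.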

The  use  of  polygons  for  representing  multivariate  data 
has been reported in various fields, and several of the comparative
studies of multivariate techniques mentioned earlier
have  included  them.  Polygons  have  been  used  for  data  explo¬ 
ration  and  presentation  in  some  studies;  Hanson,  Kraut,  and 
Farber  (1984),  for  example,  used  this  technique  in  a  study 
of use patterns for UNIX commands; Zelenka, Cherry, Nir, and
Siegal  (1984)  used  polygon  displays  to  present  data  on  the 
variation  in  the  growth  of  quail.  A  single  polygon  has  been 
employed  as  a  graphical  data  display  in  investigations  of  the 
role  of  integration  of  information  (Carswell  and  Wickens, 
1984; Goldsmith and Schvaneveldt, 1982). Goldsmith and
Schvaneveldt noted that performance was better for polygons
than Chernoff faces in pilot studies. This type of display
has been employed in a proposed Safety Parameter Display
System for nuclear power plants (Petersen, Banks, and
Gertman,  1982;  Woods,  Wise,  and  Hanes,  1981,  1982).  Polygon 
displays  are  the  basis  of  the  decision  polar  graph,  a  visual 
representation  of  data  on  multiple  criteria  intended  as  an 
aid  to  the  management  decision  process;  Frazelle  (1985)  shows 
an  example  for  alternative  material  handling  systems. 

While  polygons  have  not  fared  as  well  as  some  of  the 
other  point  representation  techniques  in  some  comparative 
studies,  they  have  proved  better  in  others.  They  are  in  many 
respects  more  straightforward  and  simpler  to  understand  than 
some  of  the  other  icons,  such  as  blobs  (Fourier  series  in 
polar  co-ordinates)  or  castles.  Faces  are  burdened  with 
problems  of  subjective  interpretation  and  correlation  of 
variables.  Everitt  (1978)  has  a  good  example  of  the  depend¬ 
ence  of  this  type  of  display  on  the  relation  between  the 
variables  of  the  data  and  the  facial  features  used  to  repre¬ 
sent  them.  Cleveland  and  McGill  (1984)  point  out  the  diffi¬ 
culty  of  interpreting  faces  due  to  the  complexity  of  the 
perceptual  judgments  which  have  to  be  made.  The  observer 
must  compare  such  diverse  features  as  length  of  nose,  slant 
of  eye,  shape  of  face,  and  curvature  of  mouth,  for  example, 
to  extract  a  sense  of  the  relation  of  the  variables  in  a 
single  data  point.  Trees  and  castles  depend  on  the  hierar¬ 
chical  clustering  of  the  variables;  such  clustering  will  vary 
with  the  data  sets  and  may  not  be  desirable  in  some  in¬ 
stances. 
The polygons used in comparative studies have varied in
their visual features; none of these studies has tried to
determine their optimal characteristics. Because of their
simplicity,  ease  of  use,  and  their  applications  in  exper¬ 
imental  tasks,  process  monitoring,  and  representation  for 
decision  making,  further  investigation  of  polygons  seemed 
justified.  By  identifying  the  characteristics  which  lead  to 
better  performance,  polygons  can  be  employed  more  efficiently 
and more validly compared to other display types.

Of  the  tasks  for  which  point  representations  are  gener¬ 
ally  recommended,  it  was  decided  to  investigate  the  perform¬ 
ance  in  a  clustering  task.  This  task  is  one  that  has  various 
applications,  is  related  to  other  tasks  for  which  polygons 
are  suited,  and  has  not  been  investigated  in  detail  with 
polygons.  Given  a  set  of  objects  for  which  various  attri¬ 
butes  are  measured,  one  divides  the  objects  into  groups  on 
the  basis  of  their  similarity.  Such  a  task  may  often  be  an 
initial  step  in  data  analysis.  The  hotel  occupancy  example 
from  Bertin,  mentioned  earlier,  illustrates  the  use  of  this 
technique  in  analyzing  data  as  part  of  a  decision  process; 
plant  location,  material  handling  alternatives,  or  similar 
problems  with  multiple  variables  could  be  handled  in  this 
manner. Clustering is related to categorization, and as such
could  find  application  in  tasks  which  require  one  to  cate¬ 
gorize  a  new  object  into  one  of  several  classes  on  the  basis 
of its similarity to a typical member of the class. Identification
tasks or status displays might be considered related
to this task. Finally, polygon displays find
application  in  data  presentation,  to  support  or  explain  a 
clustering  or  categorization  task. 

Clustering  and  categorization  have  been  of  interest  in 
both cognitive psychology and statistics. The terms categorization
and classification are at times used almost
interchangeably  in  the  psychological  literature  (e.g.,  Reed, 
1972),  but  here  categorization  will  be  used  to  denote  such  a 
task.  Classification  will  be  used  to  denote  the  process  of 
forming  a  hierarchy  of  objects  or  groups  of  objects,  as  a 
taxonomy.  The  two  processes  are  not  exactly  the  same,  though 
they  are  related.  At  a  particular  level  of  a  classification 
hierarchy,  one  would  categorize  objects  into  groups. 

Categorization seems to be basic to an organism's information
processing. Rosch (1978) points out two principles
underlying  the  process,  the  need  for  some  efficient  or  eco¬ 
nomical  way  to  deal  with  the  complexity  of  information  in  the 
world  and  the  view  that  the  world  of  stimuli  has  a  correlated 
structure.  These  principles,  especially  the  first,  are  noted 
by  most  in  discussing  categorization.  The  psychological 
theories  of  the  process  are  varied;  Anderson  (1980)  has  a 
good  discussion  of  the  various  theories  and  some  of  the 
problems  with  each.  At  present  some  form  of  prototype  or 
schema theory seems to be the most widely supported, depending,
in part, on the definition of prototype or schema. According
to this theory some concept of a typical member of
each category is formed by the observer, whether as a full
member or as a set of rules for membership, and new stimuli
are then categorized on the basis of these prototypes or
schemata.

In  categorizing  a  group  of  stimuli,  it  is  generally  as¬ 
sumed  that  the  categories  are  formed  in  such  a  manner  as  to 
retain  the  maximum  similarity  between  members  of  the  category 
and  maximum  difference  between  contrasting  categories  (Rosch, 
1978;  Tversky  and  Gati,  1978).  Similarity  judgment  has  been 
modelled  on  the  basis  of  a  multidimensional  geometric  space, 
similarity  being  related  to  some  distance  measure  when  stim¬ 
uli  are  considered  points  in  this  space.  Reed  (1972),  for 
example,  studied  a  number  of  different  distance  metrics  as 
the  basis  for  categorization  of  visual  stimuli,  cartoon  faces 
like  Chernoff  faces.  Recently  several  other  models  have  been 
proposed. The cue validity model (Reed, 1972; Rosch, 1978)
relates  similarity  to  a  function  of  the  conditional  proba¬ 
bilities  of  category  membership  for  each  aspect  of  the  stim¬ 
ulus.  Tversky  (1977)  has  proposed  a  model,  termed  the 
contrast  model,  in  which  similarity  is  a  linear  function  of 
the  common  and  distinct  aspects  of  the  stimuli.  These  more 
recent  models  were  developed  to  try  to  explain  certain  prob¬ 
lems  in  the  purely  geometric  models,  such  as  the  asymmetry 
of  judgments  depending  upon  the  direction  of  comparison  and 
the  role  of  context.  In  general,  though,  the  geometric  model 
still provides a reasonable approximation for many tasks, as
Tversky (Tversky and Gati, 1978) points out.
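
As a rough illustration of the two families of models discussed above, the following Python sketch contrasts a geometric distance with a contrast-model similarity. It is not taken from any of the cited studies: the Minkowski family is used as one common example of a distance metric, set size stands in for the salience function of the contrast model, and the function names and default weights are assumptions chosen for the example.

```python
def minkowski_distance(x, y, p=2.0):
    """Geometric model: stimuli are points, dissimilarity is a distance.

    p = 1 gives the city-block metric, p = 2 the Euclidean metric.
    """
    if len(x) != len(y):
        raise ValueError("stimuli must have the same number of dimensions")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Contrast model: similarity as a linear function of the features
    two stimuli share and the features distinctive to each.

    Assumption for this sketch: the salience of a feature set is
    simply the number of features it contains.
    """
    a, b = set(a), set(b)
    common = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    return theta * common - alpha * only_a - beta * only_b
```

A distance is necessarily symmetric, whereas the contrast model gives different values for the two directions of comparison whenever alpha and beta differ and the stimuli have unequal numbers of distinctive features. This is one way of seeing how it can accommodate the asymmetry of judgments mentioned above.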

In  the  field  of  statistics  a  group  of  procedures  for 
categorizing objects has been developed, known most commonly
as cluster analysis. Introductions to these techniques can
be found in various multivariate texts (e.g., Dillon and
Goldstein, 1984) and in several single volumes devoted to the
subject (Everitt, 1974; Hartigan, 1976). The techniques begin
with a distance or similarity matrix of the various objects
to be clustered. Two basic methods of finding clusters
are  used.  In  hierarchical  techniques  the  objects  are  pro¬ 
gressively  merged  on  the  basis  of  some  metric,  adding  members 
and  combining  clusters  until  one  cluster  is  finally  formed; 
alternatively,  some  hierarchical  procedures  begin  with  one 
cluster, progressively dividing the clusters. The other
group  of  techniques  includes  those  which  partition  the  ob¬ 
jects  into  a  predetermined  number  of  mutually  exclusive 
clusters  by  minimizing  some  within  cluster  metric  and  maxi¬ 
mizing  some  between  cluster  metric.  There  is  a  variety  of 
clustering  methods,  and  each  has  its  own  advantages  and 
problems.  The  results  from  different  techniques  often  vary. 
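
The agglomerative form of the hierarchical approach can be sketched as follows. This is only an illustration: it assumes a single-link rule (the distance between two clusters is the distance between their closest members), which is one of several possible metrics, and the small distance matrix is invented for the example.

```python
def agglomerate(dist, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain.

    dist is a symmetric matrix of pairwise distances between objects.
    Single-link distance is an assumption of this sketch.
    """
    clusters = [{i} for i in range(len(dist))]   # start with one object per cluster
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single link: distance between the closest pair of members
                d = min(dist[p][q] for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]               # membership is fixed once joined
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Invented distances among four objects.
dist = [
    [0, 1, 6, 7],
    [1, 0, 5, 8],
    [6, 5, 0, 2],
    [7, 8, 2, 0],
]
groups = agglomerate(dist, 2)
```

Stopping the merging at a chosen number of clusters is equivalent to cutting the resulting tree at that level.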

It  has  been  suggested  that  point  representations  such 
as  polygons  might  be  used  for  rough  clustering  or  approxi¬ 
mating  the  number  of  clusters  in  an  exploratory  look  at  the 
data,  as  well  as  for  presentation  of  the  patterns  in  the  data 
(e.g., Everitt, 1978; Dillon and Goldstein, 1984). Such a
graphic technique could be used by individuals not well
versed  in  the  arcana  of  matrix  algebra  and  multivariate 
analysis  techniques.  For  polygons  to  be  effectively  used  in 
tasks  such  as  clustering  they  should  be  able  to  be  consist¬ 
ently  clustered.  Visual  variables  which  encourage  such  con¬ 
sistency  should  be  employed  when  designing  such  a  display. 
The  Experiment 

The  Display.  Although  some  software  packages  will 
produce  polygons,  SAS/GRAPH  (Statistical  Analysis  System)  for 
example,  the  variations  for  this  investigation  necessitated 
writing  programs  to  produce  the  displays.  Figures  were 
plotted  using  a  Versatec  1200  electrostatic  plotter,  then 
separated  so  that  a  single  polygon  appeared  on  each  three  by 
four  inch  (7.6  x  10.2  cm.)  card,  with  a  title  and  number 
below.  This  legend  provided  a  base  to  orient  the  figure  and 
allowed keeping track of the cluster membership (see Figure 2).

Certain aspects, actually "dimensions" to use Garner's
(1970, 1978) terminology, of the graphic were varied to determine
whether changes in them would affect the consistency
of clustering. The first visual aspect to be varied was the
shading  of  the  figure.  It  was  hypothesized  that  a  shaded 
figure  would  be  perceived  more  readily  as  a  whole  than  an 
outline  and  be  more  consistently  clustered.  The  reported 
studies  have  all  used  outline  figures. 
The amount of additional information displayed with each
polygon  was  thought  to  interact  with  the  tasks  for  which  it 
is  used.  Goldsmith  and  Schvaneveldt  (1982)  noted  an  increase 
in  performance  on  an  information  integration  task  with  a 
polygon  display  which  included  internal  radii.  For  the  task 
of clustering, though, one might expect that excess information
would detract from the comparisons by shape. Three
levels  of  additional  information  were  used  in  this  study. 

The  first  was  no  additional  information;  the  polygon  alone 
was  presented.  The  second  level  included  a  background  circle 
through  the  means  of  each  variable  and  internal  radii.  The 
final  level  included  a  circular,  polar  grid,  against  which 
the polygon was presented, including internal radii (see
Figure 3).

The  task  of  clustering  is  one  of  grouping  the  figures 
on the basis of their perceived similarity or dissimilarity.
The variation between polygons, of course, is created by the
differences in vector length of the radii; judgments of
length  may  play  a  role  in  the  comparisons.  It  might  be  ex¬ 
pected,  then,  that  accentuating  this  length  would  be  benefi¬ 
cial  to  performing  the  clustering  task.  On  the  other  hand, 
the  grouping  task  may  be  performed  more  on  the  perception  of 
the  overall  shape  or  pattern,  in  which  case  an  accentuated 
figure  would  prove  no  easier  to  group  than  the  standard 
polygon.  To  investigate  the  relation  between  form  and  per¬ 
formance,  polygons  were  presented  in  two  different  forms,  the 
standard polygon and an accentuated form, the star. By connecting
each value point on the radius with a small base
circle mid-way between each pair of radii, the length along
the radius is accentuated. The rays of the star were approximately
the same length as the radii of the corresponding
standard polygon, but the attribute of length was visually
emphasized (see Figure 4).
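
The geometry of the two forms can be sketched in a few lines. This is not the plotting program used in the study; it assumes that the values have already been scaled to a common range, that the first radius lies at angle zero, and that the base circle has a radius of 0.1 (the parameter base_radius is hypothetical).

```python
import math


def polygon_vertices(values):
    """One vertex per variable, placed on equally spaced radii."""
    n = len(values)
    return [
        (r * math.cos(2 * math.pi * i / n), r * math.sin(2 * math.pi * i / n))
        for i, r in enumerate(values)
    ]


def star_vertices(values, base_radius=0.1):
    """Alternate each value point with a point on the small base circle.

    The base-circle point lies mid-way between adjacent radii, so every
    ray of the star runs out to the value and back in again.
    """
    n = len(values)
    points = []
    for i, r in enumerate(values):
        angle = 2 * math.pi * i / n
        points.append((r * math.cos(angle), r * math.sin(angle)))
        mid = angle + math.pi / n   # half-way to the next radius
        points.append((base_radius * math.cos(mid), base_radius * math.sin(mid)))
    return points


# Nine scaled values, matching the number of variables used per data set.
values = [0.8, 0.5, 0.9, 0.4, 0.7, 0.6, 1.0, 0.3, 0.5]
poly = polygon_vertices(values)
star = star_vertices(values)
```

Joining the vertices in order and closing the path gives the outline; the tips of the star fall at the same points as the vertices of the standard polygon.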

Two  data  sets  were  selected  for  use  in  the  experiment 
(see  Tables  1  and  2).  The  first  is  a  subset  of  data  used 
in examples in Chambers, Cleveland, Kleiner, and Tukey
(1983);  the  second  came  from  McDonald  and  Ayers  (1978). 
Subsets  of  the  variables  were  chosen,  nine  in  each  case,  to 
give  reasonably  complex  polygons  without  being  too  complex. 

I  have  seen  no  reported  studies  of  the  relation  between  the 
number  of  variables  portrayed  with  polygons  and  performance, 
but  it  would  seem  intuitive  that  there  is  an  upper  limit. 
Representations  with  very  few  variables  also  seem  more  dif¬ 
ficult  to  cluster.  The  number  of  variables,  of  course,  would 
be  related  to  the  perceived  complexity  of  the  figures; 
Attneave  (1957)  found  that  degree  of  perceived  complexity  was 
primarily  determined  by  the  number  of  turns  in  a  figure,  but 
also  by  the  angular  variability,  which  would  be  a  function 
of  the  juxtaposition  of  variables  and  their  value  in  each 
individual  case.  The  variables  were  selected  to  try  to  pre¬ 
serve  some  of  the  relationships  in  the  full  data  sets,  though 
the decision was, at times, arbitrary. Basically, a workable
subset of the full data sets was sought.

A  group  of  between  20  and  30  objects  was  chosen  which 
seemed  to  divide  reasonably  well  into  a  small  number  of 
clusters,  as  determined  by  pretesting.  Enough  objects  were 
included  to  make  the  task  non-trivial  and  still  allow  sub¬ 
jects  to  complete  four  clustering  tasks  in  about  an  hour. 

All  manipulations  and  decisions  on  the  data  sets  were  made 
using  polygon  displays  generated  in  the  same  manner  as  those 
for  the  experimental  task. 

Earlier  studies  using  icons  in  categorization  tasks  have 
used  artificially  generated  data.  Jacob  (197£;  Jacob  et  al., 
1976)  had  subjects  categorize  stimuli  with  various  icon  rep¬ 
resentations  from  randomly  generated  data.  Chernoff  and 
Rizvi  (1975)  and  Wilkinson  (1982)  also  used  random  data  in 
categorization and similarity judgment tasks; Freni-Titulaer
and  Louv  (1984)  generated  special  data  sets,  varying  specific 
parameters.  It  is  certainly  easier  to  study  the  effects  of 
various  types  of  data  on  the  task  with  generated  sets,  but 
in  an  exploratory  task,  for  which  icons  are  often  recom¬ 
mended,  one  probably  won't  know  what  parameters  characterize 
the  data.  Thus  it  was  felt  that  real  data  would  at  least  have 
face  validity. 

A  second  factor  with  randomly  generated  data  is  related 
to  its  use  as  the  criterion  with  accuracy  as  the  measure. 
This  issue  is  at  the  basis  of  judging  categorization.  If  we 
generate two data sets with a pseudo-random number generator,
about  two  distinct  points  in  multidimensional  space,  such 
that  there  is  no  overlap,  can  we  rightly  say  that  the  sub¬ 
jects  "miscategorized"  the  points  if  they  don't  reproduce  the 
geometric  clusters?  Geometry  may  not  be  the  appropriate 
measure  for  pattern  similarity  of  figures,  nor  for  similari¬ 
ties  in  data.  This  problem  is,  to  some  extent,  a  part  of  the 
difficulties  that  cluster  analysis  faces. 

Two  aspects  of  the  subjects'  clusterings  were  used  as 
measures  of  performance.  The  first  was  the  agreement  of  the 
clusterings  with  some  standard  clustering  of  the  data.  With 
generated  data  the  standard  would  be  provided  from  the  algo¬ 
rithm  for  generating  the  data.  Since  "real"  data  were  being 
used  as  the  basis  of  the  displays,  these  data  sets  were  sub¬ 
jected  to  cluster  analysis  techniques  to  derive  the  stand¬ 
ards.  Differences  between  the  standard  and  the  subjects' 
results  became  the  metric. 

There  are  numerous  cluster  analysis  techniques  available 
and  the  results  of  different  techniques  are  not  always  con¬ 
sistent.  For  any  graphic  clustering  method  to  be  reliable, 
though,  it  should  be  reasonably  consistent  across  individ¬ 
uals,  as  well  as  with  the  same  individual  across  time.  Vis¬ 
ual  aspects  of  the  display  which  show  more  consistent 
performance  should  be  identified.  The  second  measure  of 
performance  was  consistency  across  the  subjects  at  each  level 
of the variables. Differences between the subjects' clusterings
were used as a metric of this consistency.

Design. The three visual variables of the display and
data set were combined in a 2 x 3 x 2 x 2 full factorial experimental
design (see Figure 5). The variables of additional
information and shading were between-subjects
variables. In order to make best use of subjects and control
for some of the expected variation, the variable of form was
treated as a within-subject variable. Pretesting indicated
that  this  was  feasible.  The  titles  of  the  data  sets,  printed 
at  the  bottom  of  each  polygon,  were  interchanged  between 
forms,  and  the  order  of  the  polygons  varied.  Since  both  data 
sets  were  seen  by  each  subject,  data  set  was  also  a  within- 
subject  variable.  Each  subject  thus  saw  four  sets  of  fig¬ 
ures,  both  data  sets  at  both  levels  of  form. 
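
The layout of the design can be enumerated directly. The level names follow the description in the text; the figure of six subjects per between-subjects group is an inference from the 36 subjects reported for the experiment and an assumed equal allocation.

```python
from itertools import product

# Levels as described in the text.
shading = ["outline", "shaded"]                              # between subjects
information = ["none", "circle and radii", "polar grid"]     # between subjects
form = ["polygon", "star"]                                   # within subject
data_set = ["car", "city"]                                   # within subject

cells = list(product(shading, information, form, data_set))  # 2 x 3 x 2 x 2
between_groups = list(product(shading, information))         # groups of subjects
within_conditions = list(product(form, data_set))            # sets seen by each subject

# Assumes the 36 subjects were spread evenly over the between-subjects groups.
subjects_per_group = 36 // len(between_groups)
```

Each subject contributes one clustering to each of the four within-subject conditions in his or her group.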

Subjects. A broad spectrum of subjects participated in
the  experiment,  36  in  all.  They  came  from  the  academic  com¬ 
munity  and  had  at  least  some  post-secondary  education.  They 
ranged  in  age  and  occupation,  from  students  to  middle-aged 
professionals.  It  was  felt  that  such  an  exploratory  exper¬ 
iment  as  this  should  include  a  range  of  subjects  and  not  be 
limited  to  a  specific  expertise.  Through  informal  inquiry 
it  was  learned  that  none  of  the  subjects  had  seen  this  type 
of  graphic  display. 

Procedure. The experiment was run over a two and a half
week period, all sessions being in the afternoon or early
evening. A classroom at Virginia Polytechnic Institute and
State University was used for all sessions, providing a
quiet, well-lighted, and comfortable place to work. If more
than one subject participated at a time, sessions were
staggered.

Figure 5. Experimental design.

Each  session  began  with  the  subject  reading  the  consent 
form.  A  set  of  instructions  was  then  given  which  provided 
a  brief  overview  and  background  of  the  display  and  its  use, 
an  illustrative  example,  and  the  actual  instructions  for  the 
task (see Appendix). The shading of the figures in the instructions
corresponded to that which the subjects would be
given;  the  form,  with  that  which  the  subjects  would  see 
first.  After  it  was  determined  that  the  subjects  had  read 
and  understood  the  instructions,  the  four  sets  of  figures 
were  presented,  one  set  at  a  time.  Subjects  were  requested 
to  initially  spread  all  the  figures  out  before  beginning  the 
clustering;  the  figures  were  presented  in  numerical  order  in 
a  pack.  On  top  of  the  pack  was  a  card  indicating  the  number 
of  groups  into  which  to  divide  the  figures  and  explaining  the 
background (level of additional information). The order of
presentation  of  the  levels  of  additional  information  and  of 
shading  was  balanced  throughout  the  experiment  by  rotating 
through  the  six  combinations.  The  initial  order  was  deter¬ 
mined  randomly.  The  order  of  form  and  data  set  was  also 
balanced.
Training of the subjects was considered, but training
must have a well-defined criterion from which feedback can
be  provided.  Several  of  the  studies  of  graphics  in  educa¬ 
tion,  mentioned  earlier,  have  commented  on  the  fact  that  the 
graphic  perception  of  ideas  is  not  necessarily  intuitive,  and 
Kruskal  (1982)  comments  on  the  need  for  more  work  relating 
to  graphic  interpretation.  This  study  was  intended  as  ex¬ 
ploratory,  to  investigate  several  aspects  of  polygon  displays 
in  relation  to  clustering  with  a  fairly  broad  range  of  sub¬ 
ject  backgrounds.  It  was  intended  to  look  at  the  subjects' 
existing  ability  to  perceive  pattern  and  judge  similarity. 

In  addition,  a  specific  criterion  to  which  subjects  could  be 
trained  was  lacking.  There  was  no  training  of  subjects 
undertaken  beyond  the  explanation  and  example  given  in  the 
instructions.

An  effort  was  made  to  keep  the  sessions  informal,  though 
serious,  and  to  answer  any  questions  which  might  arise.  At 
the  end  of  each  session,  any  further  questions  were  answered, 
along  with  discussion  of  possible  applications  or  problems, 
if the subject was interested. Many of the subjects expressed
interest in the display and in some ways it might be
used.  Sessions  ranged  in  length  from  about  95  to  80  minutes. 
CHAPTER THREE -- EXPERIMENTAL RESULTS AND DISCUSSION
Frequency  Investigation 

As  a  preliminary  step  to  the  analysis  of  the  effects 
of  the  visual  variables  used  in  this  study,  the  full  results 
of  the  subjects'  clusterings  were  reviewed  by  observing  the 
frequency  of  the  data  points  being  clustered  together.  This 
analysis  would  allow  a  look  at  the  overall  pattern  of  clus¬ 
tering.  A  frequency  count  indicating  how  often  each  pair  of 
objects  appeared  in  the  same  cluster  was  made  (see  Tables  3 
and 4). The range of values was very broad, as can be seen
from the tables. Of the 72 times each pair of data points
appeared, for the car data the frequencies ranged from 71 to
zero;  the  city  data  had  a  similar  range.  From  these  fre¬ 
quency  matrices  the  clusters  of  points  found  most  frequently 
together  were  determined  by  a  heuristic  method.  The  pairs 
were  ordered  by  frequency;  then  by  descending  through  the 
order,  members  were  added  to  form  groups  on  the  basis  of 
their  high  frequency  with  other  group  members  and  low  fre¬ 
quency  with  members  of  the  other  groups.  These  groups  are 
shown in Figure 6.
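
The frequency count of pairs can be sketched as below. The three clusterings are invented toy data (the city names are borrowed only as labels), and the function name pair_frequencies is hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_frequencies(clusterings):
    """Count how often each pair of objects falls in the same cluster."""
    counts = Counter()
    for clustering in clusterings:
        for cluster in clustering:
            # every unordered pair within one cluster was grouped together once
            for pair in combinations(sorted(cluster), 2):
                counts[pair] += 1
    return counts


# Toy clusterings from three imaginary subjects, not experimental results.
clusterings = [
    [{"Akron", "Albany"}, {"Dallas", "Houston"}],
    [{"Akron", "Albany", "Dallas"}, {"Houston"}],
    [{"Akron", "Albany"}, {"Dallas", "Houston"}],
]
freq = pair_frequencies(clusterings)

# Pairs ordered from most to least frequently grouped together.
ranked = [pair for pair, _ in freq.most_common()]
```

Descending through the ranked pairs, as in the heuristic described above, gives the order in which members would be considered for the same group.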

For City data --

A  Akron, Albany, Canton, Cleveland, Grand Rapids,
   Kansas City, Columbus, Dayton, Flint, Rochester,
   Youngstown, Indianapolis, New Haven

B  Atlanta, Greensboro, Houston, Richmond

C  Birmingham, Chattanooga, Dallas, Ft. Worth,
   Memphis, Nashville, New Orleans

D  Los Angeles, San Diego, San Jose, San Francisco

For Car Data --

A  Cad. Eldorado, Dodge Diplomat, Lincoln Cont., Cad.
   Seville, Dodge St. Regis, Lincoln Versailles,
   Olds. Toronado, Pont. Grand Prix

B  Chev. Malibu, Buick Century, Merc. Zephyr, Olds.
   Cutlass, Merc. Marquis, Olds 98, Buick Electra,
   Pont. Catalina

C  Datsun 210, Toyota Corolla, Dodge Colt, Honda
   Civic, Mazda GLC, Subaru, Ford Fiesta,
   Plym. Champ

Figure 6. Clusters on the basis of frequency of pairs.

Closer examination of the matrix for car data shows one
group of cars which was found together very often; these are
the subcompacts. The average frequency for pairs in this
group was 67.54 (s = 2.08); the other two groups showed lower
averages and higher deviations (for the group labeled
A, average = 39.14, s = 12.59; for B, average = 46.21, s =
11.6).

For  the  city  data  one  group  was  most  frequently  clus¬ 
tered;  labeled  D,  it  had  an  average  frequency  for  pairs  of 
60.17 (s = 4.54). These were all California cities. The
other groups had lower averages and more variation. For the
Northern and Mid-West cities, A, the average was 48.05 (s =
7.70); the averages for the two groups of Southern cities
were, for B, 38.5 (s = 13.00) and C, 43.29 (s = 8.48).

Since all the pre-experimental work with the data sets
had  been  done  using  graphic  representations,  the  data  sets 
used  and  the  clusters  found  based  on  the  frequency  analysis 
were  analyzed  by  several  more  traditional  statistical  tech¬ 
niques  to  observe  the  structure  of  the  data  sets. 

A  principal  components  analysis  of  both  data  sets  was 
conducted to reduce the dimensionality of the data. Principal
components analysis finds the orthogonal linear transformations
of the variables which account for most of the
variance in the original data. The eigenvectors of the covariance
matrix are used for this transformation. The technique
is often used to reduce the dimensionality of the data
by  finding  certain  principal  factors  or  components  which  ac¬ 
count  for  most  of  the  variance  in  the  data,  though  interpre¬ 
tation  of  the  derived  components  is  not  always  clear.  This 
reduction  of  dimensionality  can  sometimes  also  be  used  to 
graphically  portray  the  overall  structure  of  the  data.  By 
plotting the scores of the major principal components against
each  other  one  may  be  able  to  get  a  two  dimensional  repre¬ 
sentation  of  the  distribution  of  the  data.  Clusters  appear 
as  groups  of  points  when  the  data  are  thus  projected  onto  a 
plane.  Using  the  clusters  from  the  frequency  analysis  as 
symbols, the principal components were plotted for both sets
of data. For the city data the plots of the first and second
principal components and of the first and the third are shown
(Figure 7). That for the second and third is similar, but
more  uniformly  distributed. 
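
The computation behind such a plot can be sketched without a statistical package. The example below finds only the first principal component, using power iteration on the covariance matrix; the two-variable data are invented, and the method of extracting the eigenvector is an assumption of the sketch rather than the procedure of the packages used in the study.

```python
import math


def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    return cov, centered


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix by power iteration."""
    v = [1.0] * len(cov)
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(len(v))) for a in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]   # rescale to unit length at each step
    return v


# Invented data: the second variable is roughly twice the first.
rows = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
cov, centered = covariance_matrix(rows)
pc1 = first_component(cov)

# Score of each object on the first component (one axis of the plot).
scores = [sum(c[j] * pc1[j] for j in range(len(pc1))) for c in centered]
```

Repeating the procedure on what remains after removing the first component would give the second axis for a two-dimensional plot.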

These  plots  show  that  there  is  not  a  simple  structure 
in  this  data  set,  that  the  clusters  aren't  nicely  separated 
or  "natural."  The  group  designated  with  A,  the  Northern  and 
Mid-West  cities,  does  seem  to  form  a  grouping,  and  is  some¬ 
what  separated,  especially  in  the  second  plot.  The  two 
groups  of  Southern  cities,  B  and  C,  are  not  well  separated. 
The  four  cities  identified  by  D,  the  California  cities,  ap¬ 
pear  spread  out,  apart  from  the  other  points.  These  four 
points  are  the  ones  with  which  the  mathematical  clustering 
methods,  discussed  later,  had  trouble;  such  techniques  are 
sensitive  to  outliers  (Dillon  and  Goldstein,  1984). 

Considering  these  four  points  more  closely  leads  to  an 
interesting  aspect  of  performance  with  polygon  type  displays 
(see  Figure  8).  The  frequency  analysis  indicated  that  pairs 
of  these  four  points  were  most  commonly  grouped  together. 
Comments  made  by  some  of  the  subjects  indicated  that  they  had 
given special consideration to the spiked shape at the top
of these figures; observation of subjects revealed that these
four were often the first grouped. The fact that these
cities were grouped together so frequently may be accounted
for by considering Tversky's hypothesis of asymmetry in similarity
judgments (1977; Tversky and Gati, 1978). People
tend to overemphasize similar aspects of stimuli when making
similarity judgments, as subjects were here requested to
perform.  Some  subjects  indicated  in  remarks  that  they  had 
used  distinctive  aspects  to  form  other  groups. 

There  are  two  implications  of  this  observation  for  the 
use  of  polygon  displays  for  clustering  or  for  categorization. 
In  effect  the  subjects  weighted  certain  variables  in  their 
clustering.  This  weighting  would  be  related  to  the  order  of 
presentation  of  the  variables,  which  produces  the  distinctive 
pattern.  Reordering  the  variables  might  bring  different 
patterns  to  the  fore,  and,  of  course,  in  exploratory  analy¬ 
sis,  one  would  probably  not  know  which  variables  are  likely 
to  produce  such  patterns.  For  categorization  tasks  where  the 
weighting  would  hinder  the  accuracy,  the  display  design  and 
ordering  of  variable  presentation  should  take  this  phenomenon 
into  account. 

On  the  other  hand,  observing  such  patterns  in  the  data 
is  one  of  the  purposes  of  graphic  analysis.  In  this  exper¬ 
iment  most  subjects  picked  out  the  only  California  cities 
from the set, something the mathematical algorithms had
problems  with. 

For  the  car  data  the  principal  components  plot  shows  one 
fairly  distinct  group  (indicated  with  C,  Figure  9).  The 
other  data  points  are  fairly  spread  out  and  there  is  overlap 
in  the  groups  identified  by  the  subjects. 

Everitt  (1978)  proposes  plotting  the  canonical 
discriminant  scores  from  graphic  clustering  to  observe  sepa¬ 
ration.  Canonical  discriminant  (variate)  analysis  seeks  to 
find  orthogonal  linear  combinations  of  the  variables  of  the 
data  for  which  category  membership  is  already  known.  The 
weights  for  these  functions  are  based  on  the  ratio  of  the 
between sums of squares matrix and the within; the
eigenvectors of the product of the two matrices are used.
The canonical variate scores can then be plotted against each
other to produce a two-dimensional representation of the
discrimination between the groups. Since mathematical clustering
techniques find clusters on the basis of maximizing
separation by manipulating these matrices, canonical
discriminant analysis cannot be used to evaluate such results.
With graphic clusterings, though, there may be some
rough  validity  in  performing  such  an  evaluation.  If  subjects 
were  actually  grouping  random  points,  the  analysis  could  not 
find  a  good  transformation  and  the  plot  would  not  show  any 
separation. The plots show some separation, as did the
principal  components  plots  (Figure  10). 
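
In matrix terms, the canonical variate problem described above is commonly written as the eigenproblem below. The notation is introduced here for illustration and is not the author's: B is the between-groups sums of squares and cross-products matrix, W the pooled within-groups matrix, v_k the k-th vector of weights, and x an observation vector.

```latex
\[
  W^{-1} B \, v_k = \lambda_k v_k ,
  \qquad
  \lambda_k = \frac{v_k^{\top} B \, v_k}{v_k^{\top} W \, v_k},
  \qquad
  z_k = v_k^{\top} x .
\]
```

Each eigenvalue is the ratio of between-groups to within-groups variation along its variate, so plotting the scores z_1 against z_2 shows the two directions in which the given groups are best separated.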
The preliminary investigation of frequency confirms that
subjects  using  polygon  type  displays  can  generally  find 
groups  of  similar  objects  in  a  data  set,  clusters  if  they 
exist,  as  some  earlier  studies  which  used  polygons  to  study 
similarity  judgments  have  assumed  (Rosch,  1978;  Tversky  and 
Gati,  1978).  In  this  study  subjects  seem  to  have  done  well 
finding  fairly  distinct  groups  in  the  data  sets  used;  as  the 
data  became  more  uniform,  the  membership  of  the  groups  found 
varies  more.  Further,  subjects  showed  a  fairly  good  ability 
to  pick  out  points  which  were  unusual  in  the  data;  outlier 
identification  is  one  of  the  recommended  uses  of  point  rep¬ 
resentation  graphics.  One  potential  problem  with  such  dis¬ 
plays  was  noted,  though;  subjects  did  seem  to  weight  their 
judgments  depending  on  the  distinctive  characteristics  of  the 
representation. This tendency may affect the clusterings
found,  especially  when  compared  with  statistical  techniques. 
Analysis  of  Differences 

The  primary  analysis  of  the  results  of  the  experiment 
was  designed  to  investigate  the  consistency  of  the  results 
of  the  clustering  task  by  using  a  measure  of  differences  -- 
first,  those  from  standard  cluster  analysis  techniques;  then 
those  between  subjects  at  every  level  of  the  variables.  The 
results  of  each  subject's  clustering  were  tabulated  and  the 
difference  scores  were  figured  by  comparing  each  cluster  from 
a  subject's  results  with  the  corresponding  cluster  of  the 
standard or another subject's results. Since the identification
of an individual cluster is determined by its elements
and  not  some  external  attribute,  the  corresponding  clusters 
were  considered  to  be  those  with  the  least  number  of  differ¬ 
ences  and  the  total  of  the  cluster  differences  was  used  as 
the  difference  score  for  that  pair.  The  difference  scores 
were  adjusted  to  take  into  account  the  difference  in  the  size 
of  the  two  data  sets  used  in  the  experiment.  Difference 
scores  for  the  car  data  set  were  divided  by  32;  for  the  city 
data, by 42. These adjustment factors constitute the maximum
possible  difference  scores  determined  by  comparisons  of  ran¬ 
domly  generated  clusters. 
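
One possible reading of this scoring procedure is sketched below. It assumes that the "differences" between two clusters are the objects belonging to one but not the other, that both clusterings contain the same number of clusters, and that the correspondence is the pairing with the smallest total; the exact counting rule used in the study is not reproduced here, and the function name and toy clusterings are hypothetical.

```python
from itertools import permutations


def difference_score(first, second, max_score=1):
    """Smallest total number of differences over all pairings of clusters.

    Differences are counted as the symmetric difference of each matched
    pair of clusters (an assumption of this sketch). Dividing by max_score
    adjusts for the size of the data set.
    """
    first = [set(c) for c in first]
    second = [set(c) for c in second]
    best = min(
        sum(len(a ^ b) for a, b in zip(first, order))
        for order in permutations(second)   # try every correspondence
    )
    return best / max_score


# Toy clusterings of seven numbered objects.
standard = [{1, 2, 3}, {4, 5}, {6, 7}]
subject = [{3, 4, 5}, {1, 2}, {6, 7}]

raw = difference_score(standard, subject)
adjusted = difference_score(standard, subject, max_score=32)
```

Trying every correspondence is practical only because the number of clusters is small, as it was in this experiment.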

For  investigating  consistency  with  standard  algorithms, 
it  was  felt  that  a  comparison  should  be  made  with  each  of  the 
two  major  types  of  clustering  techniques,  hierarchical  and 
partition.  The  choice  of  specific  techniques  was  based  pri¬ 
marily  on  their  availability  and  common  use.  Those  in 
standard  statistical  packages  are  the  ones  most  likely  to  be 
employed  and  the  ones  considered.  For  a  partitioning  method, 
the  K-Means  technique  was  selected;  it  is  the  basis  of  the 
partitioning  procedure  in  BMDP  (1983)  and  the  SAS  (1985) 
partitioning  procedure,  Fastclus,  is  partially  modelled  on 
it. This technique finds clusters by rearranging the membership
until the error component, defined as the distance
between  cluster  members  and  the  cluster  mean,  is  minimized 
(see  Dillon  and  Goldstein,  1984;  Hartigan,  1975).  As  with 
other partition methods, membership of the clusters is re-evaluated
when members are shifted. In hierarchical techniques
membership is fixed when a point is joined to another
or  to  a  cluster.  Subjects  were  observed  shifting  polygons 
from  one  group  to  another,  re-evaluating  cluster  membership, 
as  a  partition  technique  would  do.  The  partition  procedure 
in  BMDP  was  used  since  that  in  SAS  is  designed  for  large  data 
sets  and  is  sensitive  to  the  order  of  presentation  with 
smaller  sets.  A  list  of  the  members  of  the  clusters  from  this 
procedure  is  given  in  Figure  11. 
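
The general idea of the K-Means technique can be sketched as follows. This is a simple batch version that reassigns all points and then recomputes the means; it is not the BMDP or SAS procedure, whose details differ, and the points and starting centers are invented.

```python
def k_means(points, centers, max_iter=100):
    """Shift points among clusters until membership stops changing."""
    centers = [list(c) for c in centers]   # work on a copy of the starting centers
    assignment = None
    for _ in range(max_iter):
        # assign each point to the cluster with the nearest mean
        new_assignment = [
            min(
                range(len(centers)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centers[k])),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break                           # no member shifted; stop
        assignment = new_assignment
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:                     # keep the old mean if a cluster empties
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Invented two-variable data with two obvious groups.
points = [[1, 1], [1.5, 2], [1, 0.5], [8, 8], [9, 9.5], [8.5, 9]]
assignment, centers = k_means(points, [[0, 0], [10, 10]])
```

Because membership is re-evaluated on every pass, a point placed in one cluster early on may later move to another, unlike in the hierarchical methods.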

For  the  hierarchical  clustering  technique  both  SAS  and 
BMDP  were  used.  For  the  car  data,  all  three  of  the  distance 
metrics  available  on  SAS  gave  the  same  clustering  at  the 
three  cluster  level,  and  BMDP  results  were  similar  (Figure 
12).  The  dendrogram  derived  from  that  produced  by  BMDP  shows 
the  clusterings  found  at  each  level  of  joining  members  or 
groups;  the  membership  at  a  given  number  of  clusters  can  be 
determined  from  the  branches  by  running  a  line  across  the 
dendrogram  at  the  appropriate  place. 

For City data --

A  Akron, Albany, Canton, Cleveland, Grand Rapids,
   Kansas City, Columbus, Dayton, Flint, Rochester,
   Youngstown, Indianapolis, New Haven

B  Atlanta, Birmingham, Chattanooga, Greensboro,
   Houston, Memphis, Nashville, New Orleans, Richmond

C  Dallas, Ft. Worth, San Diego, San Jose

D  Los Angeles, San Francisco

For Car Data --

A  Cad. Eldorado, Dodge Diplomat, Cad. Seville,
   Lincoln Versailles, Olds. Toronado, Olds. 98,
   Buick Electra

B  Chev. Malibu, Lincoln Cont., Buick Century,
   Dodge St. Regis, Merc. Zephyr, Olds. Cutlass,
   Merc. Marquis, Pont. Catalina, Pont. Grand Prix

C  Datsun 210, Toyota Corolla, Dodge Colt, Honda
   Civic, Mazda GLC, Subaru, Ford Fiesta,
   Plym. Champ

Figure 11. Clusters on the basis of K-Means technique.

For the city data set Ward's method was selected for use
to determine distances (see Figure 13). This data set contained
four points which were somewhat unique. Ward's method
joins objects on the basis of the least increase in the error
sums of squares (see Dillon and Goldstein, 1984; Everitt,
1974). The choice of an appropriate clustering technique is
always a matter of some concern (see Everitt, 1974, and brief
discussion and examples with different types of data in SAS,
1984). Both the centroid and average link measures led to
chaining, a predilection to join single points to already
existing clusters, and several points remained separate until
late in the clustering procedure; at the level of four
clusters there were two large clusters and two clusters with
one each, California cities. Ward's method tended to give
more distinct clusters with these data. Figure 14 contains
the clusterings from the hierarchical procedure for both data
sets.
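
The criterion used by Ward's method can be made concrete with a short sketch. The closed form below for the increase in the error sum of squares is the standard one; the function names and the three points are invented for illustration and do not come from the SAS procedure.

```python
def error_sum_of_squares(cluster):
    """Sum of squared distances of the members from the cluster mean."""
    mean = [sum(col) / len(cluster) for col in zip(*cluster)]
    return sum(sum((x - m) ** 2 for x, m in zip(p, mean)) for p in cluster)


def ward_increase(cluster_a, cluster_b):
    """Increase in the error sum of squares if the two clusters were joined."""
    n_a, n_b = len(cluster_a), len(cluster_b)
    mean_a = [sum(col) / n_a for col in zip(*cluster_a)]
    mean_b = [sum(col) / n_b for col in zip(*cluster_b)]
    gap = sum((x - y) ** 2 for x, y in zip(mean_a, mean_b))
    return n_a * n_b / (n_a + n_b) * gap


# Invented clusters: a close pair and a distant single point.
a = [[0.0, 0.0], [2.0, 0.0]]
b = [[10.0, 0.0]]
increase = ward_increase(a, b)
```

At each step the pair of clusters with the smallest such increase is joined, which tends to discourage attaching a single distant point to a large existing cluster.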

As  is  not  uncommon  in  using  cluster  analysis  techniques 
in  their  present  state  of  development,  there  were  different 
clusterings  from  the  different  methods.  This  inherent  in¬ 
consistency  in  the  results  from  these  techniques  and  the 
problems  in  assessing  the  relative  appropriateness  of  the 
various  results  would  make  the  use  of  a  particular  technique 
questionable  as  a  criterion  for  training. 

The  analysis  of  differences  in  this  study  was  not  in¬ 
tended  to  be  used  as  an  investigation  of  which  clustering 
algorithm  more  closely  approximates  the  way  people  categorize 
geometric  patterns  or  the  converse,  but  rather  to  use  com¬ 
monly  available  statistical  procedures  as  a  standard  against 
which  the  effect  of  the  visual  variables  might  be  compared. 
Reed  (1972)  has  already  investigated  the  relationship  between 
categorization  and  certain  mathematical  procedures,  pointing 
out  the  general  ability  of  people  to  abstract  a  prototype 
based  on  some  central  tendency.  There  are  problems  in  such 
Figure 13. Dendrogram from SAS for city data.


For City Data --

A  Akron, Albany, Canton, Cleveland, Grand Rapids,
   Kansas City, Columbus, Dayton, Flint, Rochester,
   Youngstown, Indianapolis, New Haven

B  Atlanta, Birmingham, Chattanooga, Greensboro,
   Memphis, Nashville, New Orleans, Richmond

C  Dallas, Ft. Worth, Houston, San Diego, San Jose

D  Los Angeles, San Francisco


For Car Data --

A  Cad. Eldorado, Lincoln Continental, Cad. Seville,
   Lincoln Versailles, Olds. Toronado

B  Chev. Malibu, Dodge Diplomat, Buick Century,
   Dodge St. Regis, Merc. Zephyr, Olds. Cutlass,
   Merc. Marquis, Olds 98, Buick Electra,
   Pont. Catalina, Pont. Grand Prix

C  Datsun 210, Toyota Corolla, Dodge Colt, Honda
   Civic, Mazda GLC, Subaru, Ford Fiesta,
   Plym. Champ


Figure 14. Clusters on the basis of hierarchical technique.
investigations, though; as has recently been pointed out
(Rosch, 1978; Tversky, 1977; Tversky and Gati, 1978), judgments
of similarity and categorization may not be best modelled by
relationships based on geometric distances. Were one to look
for a theoretical basis in an experiment such as this, some
aspects of the visual patterns themselves probably should be
used, as well as purely geometric distance techniques. For
example, a measure might include the ratio of overlap to
non-overlap area for each pair of figures (Rosch, 1978), along
with some overall measure of the angular variability of the
figure (Attneave, 1954, 1957).
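
A ratio of overlap to non-overlap area of the kind mentioned above can be
sketched for the polygon form as follows. This is only an illustration, not
a measure used in this study. It assumes that both figures share a center,
that each has at least three values, that the values lie at distances from
the center along evenly spaced rays, and that adjacent points are joined by
straight edges. The function names and the numerical integration step are
hypothetical choices.

```python
import math

def edge_radius(values, theta):
    """Distance from the center to the polygon boundary at angle theta.

    Assumes len(values) >= 3, vertices on evenly spaced rays, and
    0 <= theta < 2*pi.
    """
    n = len(values)
    step = 2 * math.pi / n
    k = int(theta // step)            # sector containing theta
    a = theta - k * step              # angle past the sector's first ray
    r1, r2 = values[k % n], values[(k + 1) % n]
    # Polar form of the straight edge joining the two vertices.
    denom = r1 * math.sin(a) + r2 * math.sin(step - a)
    return 0.0 if denom == 0 else r1 * r2 * math.sin(step) / denom

def overlap_ratio(a_values, b_values, samples=3600):
    """Overlap area divided by non-overlap area for two figures.

    Both figures are star-shaped about the common center, so the overlap
    at each angle is bounded by the smaller radius and the union by the
    larger one. Areas are integrated with the midpoint rule.
    """
    d_theta = 2 * math.pi / samples
    overlap = union = 0.0
    for k in range(samples):
        theta = (k + 0.5) * d_theta
        ra = edge_radius(a_values, theta)
        rb = edge_radius(b_values, theta)
        overlap += 0.5 * min(ra, rb) ** 2 * d_theta
        union += 0.5 * max(ra, rb) ** 2 * d_theta
    non_overlap = union - overlap
    return math.inf if non_overlap == 0 else overlap / non_overlap
```

Identical figures have no non-overlap area, so the sketch returns infinity
for them; a bounded variant (overlap divided by union) would avoid that, at
the cost of departing from the ratio as worded in the text.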

To  investigate  the  effects  of  the  visual  variables,  an 
analysis  of  variance  technique  was  employed  (see  Figure  15 
for  the  design  of  the  ANOVA  table  and  error  terms).  Differ¬ 
ence  scores  were  used  as  the  metric.  Three  analyses  were 
completed,  one  for  the  K-Means  comparison,  one  for  the  hi¬ 
erarchical  comparison,  and  one  for  the  comparison  between 
subjects. 

The first analysis was the comparison with the results
of the K-Means clusterings. The adjusted difference scores
between the subjects' clusterings and those of the K-Means
technique for each cell are given in Table 5. These entries
are the total in each cell of the difference scores adjusted
as described earlier. A summary of the analysis of variance
is given in Table 6.
Source            Deg. of Freedom    F Ratio
                      (n = 6)

A (add. info.)           2           MS(A)/MS(Subj./Gr.)
S (shading)              1           MS(S)/MS(Subj./Gr.)
AS                       2           MS(AS)/MS(Subj./Gr.)
Subj./Groups            30
F (form)                 1           MS(F)/MS(F x Subj./Gr.)
D (data)                 1           MS(D)/MS(D x Subj./Gr.)
FD                       1           MS(FD)/MS(FD x Subj./Gr.)
AF                       2           MS(AF)/MS(F x Subj./Gr.)
AD                       2           MS(AD)/MS(D x Subj./Gr.)
SF                       1           MS(SF)/MS(F x Subj./Gr.)
SD                       1           MS(SD)/MS(D x Subj./Gr.)
ASF                      2           MS(ASF)/MS(F x Subj./Gr.)
ASD                      2           MS(ASD)/MS(D x Subj./Gr.)
AFD                      2           MS(AFD)/MS(FD x Subj./Gr.)
SFD                      1           MS(SFD)/MS(FD x Subj./Gr.)
ASFD                     2           MS(ASFD)/MS(FD x Subj./Gr.)
F x Subj./Gr.           30
D x Subj./Gr.           30
FD x Subj./Gr.          30


Figure 15. Analysis of Variance Summary
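
The choice of error term in Figure 15 follows a simple rule: a source is
tested against the subjects-within-groups term crossed with whichever
within-subject factors (F, D) the source contains. The sketch below encodes
that rule and applies it to the mean squares reported in Table 6 (Part a).
The function and dictionary names are hypothetical, and because the tabled
mean squares are rounded, a recomputed ratio can differ from the tabled F
in the last digit.

```python
# Mean squares transcribed from Table 6 (Part a), K-Means comparison.
MEAN_SQUARES = {
    "A": 0.096, "S": 0.066, "AS": 0.088, "Subj./Gr.": 0.047,
    "F": 0.018, "D": 0.173, "FD": 0.053, "AF": 0.026, "AD": 0.005,
    "SF": 0.002, "SD": 0.004, "ASF": 0.055, "ASD": 0.013,
    "AFD": 0.028, "SFD": 0.002, "ASFD": 0.006,
    "F x Subj./Gr.": 0.014, "D x Subj./Gr.": 0.031,
    "FD x Subj./Gr.": 0.019,
}

def error_term(source):
    """Name of the error term for a source such as 'A', 'SF' or 'ASFD'."""
    # Keep only the within-subject factors (form, data) in the source.
    within = "".join(factor for factor in "FD" if factor in source)
    return f"{within} x Subj./Gr." if within else "Subj./Gr."

def f_ratio(source, mean_squares=MEAN_SQUARES):
    """F ratio for a source: its mean square over its error mean square."""
    return mean_squares[source] / mean_squares[error_term(source)]

if __name__ == "__main__":
    for source in ("A", "F", "D", "ASF"):
        print(source, error_term(source), round(f_ratio(source), 2))
```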
TABLE 5


Adjusted Difference Totals from Comparison with
K-Means Algorithm


                         Additional Information

                 Level 1            Level 2            Level 3
              (figure alone)      (with radii)       (with grid)

              Polygon   Star     Polygon   Star     Polygon   Star

City Data

  Shaded       3.00     2.57      2.14     2.05      2.38     3.19
  Non-shd.     2.14     2.43      2.67     3.10      3.29     2.86

Car Data

  Shaded       2.44     1.94      1.44     1.63      2.75     2.25
  Non-shd.     2.38     2.13      2.25     2.44      3.25     1.94

TABLE 6 (Part a)


ANOVA Results from Comparison with K-Means Algorithm


Source              df      MS        F        p

A (add. info.)       2     0.096     2.05     0.146
S (shading)          1     0.066     1.41     0.245
AS                   2     0.088     1.86     0.172
Subj./Groups        30     0.047
F (form)             1     0.018     1.29     0.265
D (data)             1     0.173     5.60     0.025
FD                   1     0.053     2.82     0.104
AF                   2     0.026     1.84     0.177
AD                   2     0.005     0.16     0.856
SF                   1     0.002     0.16     0.695
SD                   1     0.004     0.14     0.710
ASF                  2     0.055     3.93     0.030
ASD                  2     0.013     0.41     0.671
AFD                  2     0.028     1.47     0.246
SFD                  1     0.002     0.12     0.734
ASFD                 2     0.006     0.31     0.733
F x Subj./Gr.       30     0.014
D x Subj./Gr.       30     0.031
FD x Subj./Gr.      30     0.019
TABLE 6 (Part b)


Means from Comparison with K-Means Algorithm


Added Information            Shading

Figure alone    0.40         Non-shd.    0.43
With radii      0.37         Shd.        0.39
With grid       0.46


Form                         Data Set

Polygon         0.42         Car         0.37
Star            0.40         City        0.44


Added Infor.      Shading     Form

Figure alone      Non         Polygon     0.38
                  Non         Star        0.38
                  Shd.        Polygon     0.45
                  Shd.        Star        0.38

With radii        Non         Polygon     0.41
                  Non         Star        0.46
                  Shd.        Polygon     0.30
                  Shd.        Star        0.31

With grid         Non         Polygon     0.54
                  Non         Star        0.40
                  Shd.        Polygon     0.43
                  Shd.        Star        0.45
As can be seen from the results, among the visual variables
only the interaction of the level of background (additional
information), shading, and form was significant (p = 0.030).
Looking at the means, we see that shaded stars or polygons were
clustered more consistently with the clustering of the K-Means
technique at level two of background than any of the other
combinations. The two forms were clustered with about the same
consistency at this level; a simple effects test shows no
form effect (p = 0.49). At this level shading shows lower
means, though not significantly lower (p = 0.06). Level two
consists of a circle indicating the means and internal radii;
polygons with internal radii were found to give better
performance in a display integration experiment (Goldsmith and
Schvaneveldt, 1982). At level three (grid pattern) a simple
effects test shows an interaction between form and shading
(p = 0.02), with shading appearing to help polygons but hurt
stars.

There is a difference between the data sets (p = 0.025).
The car data set was clustered more consistently with the
results for the partition technique. A simple effects test
by form shows that the data set did not affect the clustering
of polygons (p = 0.36), but did affect stars (p = 0.01).
Polygons are the more commonly used form of this graphic
display.

The  adjusted  difference  results  from  the  comparison  with 
the  hierarchical  clustering  technique  are  given  in  Table  7; 
ANOVA results are in Table 8. Here again the data set shows
an effect (p = 0.008); there is also a marginally significant
(p = 0.05) interaction between form and data set. The car
data set was more consistently clustered with the hierarchical
standard, as it was with the K-Means. Simple effects
tests are also similar to the K-Means results, showing that
the data set effect is not significant for polygons
(p = 0.34), but is for stars (p = 0.002).

The interaction of background and form also shows a
significant effect (p = 0.015), with the second level (the
means and internal radii indicated) again having the lowest
mean differences, here with polygons. The simple effects
test shows a background effect (p = 0.030) for polygons, with
all levels different on the Student-Newman-Keuls test. The
background is not significant for stars (p = 0.66). The
interaction of background, shading, and form, though, does
not show a significant effect in this analysis.

The  results  of  the  comparisons  with  the  clusters  found 
by  the  two  cluster  analysis  techniques  are  similar.  When  one 
notes  that  the  cluster  membership  found  by  the  techniques  was 
similar,  the  similarity  of  the  ANOVA  results  is  understand¬ 
able.  These  results  show  a  fairly  high  number  of  differences 
between  the  subjects'  clusterings  and  those  of  the  statis¬ 
tical  techniques;  on  the  car  data  set  the  mean  difference 
score  was  12  (of  32),  and  for  the  city  set,  18  (of  42),  for 
the  K-Means  comparison.  Part  of  these  high  difference  scores 
TABLE 7


Adjusted Difference Totals from Comparison with the
Hierarchical Algorithm


                         Additional Information

                 Level 1            Level 2            Level 3
              (figure alone)      (with radii)       (with grid)

              Polygon   Star     Polygon   Star     Polygon   Star

City Data

  Shaded       2.90     2.43      2.24     2.14      2.24     3.19
  Non-shd.     2.19     2.43      2.81     3.33      3.43     2.86

Car Data

  Shaded       2.81     2.19      1.19     2.00      2.88     2.00
  Non-shd.     2.63     1.75      2.13     2.31      3.00     1.69


TABLE 8 (Part a)

ANOVA Results from Comparison with Hierarchical Algorithm


Source              df      MS        F        p

A (add. info.)       2     0.052     1.17     0.324
S (shading)          1     0.038     0.86     0.362
AS                   2     0.099     2.23     0.126
Subj./Groups        30     0.049
F (form)             1     0.031     2.12     0.156
D (data)             1     0.220     8.15     0.008
FD                   1     0.074     4.27     0.048
AF                   2     0.071     4.85     0.015
AD                   2     0.030     1.08     0.351
SF                   1     0.016     1.07     0.309
SD                   1     0.015     0.55     0.462
ASF                  2     0.035     2.35     0.113
ASD                  2     0.007     0.25     0.782
AFD                  2     0.052     3.00     0.065
SFD                  1     0.009     0.51     0.483
ASFD                 2     0.039     1.95     0.160
F x Subj./Gr.       30     0.015
D x Subj./Gr.       30     0.027
FD x Subj./Gr.      30     0.017
TABLE 8 (Part b)


Means from Comparison with Hierarchical Algorithm


Added Information            Shading

Figure alone    0.40         Non-shd.    0.42
With radii      0.38         Shd.        0.39
With grid       0.44


Form                         Data Set

Polygon         0.42         Car         0.37
Star            0.39         City        0.45


Added Information     Form

Figure alone          Polygon     0.44
                      Star        0.37
With radii            Polygon     0.35
                      Star        0.41
With grid             Polygon     0.48
                      Star        0.41


Form                  Data Set

Polygon               Car         0.41
                      City        0.44
Star                  Car         0.33
                      City        0.46
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may  be  accounted  for  by  the  California  cities,  mentioned 
earlier.  Overall,  though,  the  subjects  were  not  consistent 
with  the  statistical  techniques. 

From these two analyses it appears that the variability
in performance noted in the frequency analysis overshadows
the effects of the visual variables. It would appear that the
subjects' inconsistency in making judgments of similarity
between figures, combined with differences between subjects'
judgments, is stronger than the changes in the visual
variables used in this study. The interaction effects of the
variables are also inconsistent. The interaction of form and
shading against a background grid, together with the absence
of that interaction when figures have the radii and means
indicated, for example, is difficult to explain. The effect of
data sets is troublesome when considering the display for
exploratory analysis.

It  is  also  interesting  to  note  that  the  visually  more 
complex  form,  the  stars,  with  its  higher  angular  variability, 
is  not  significantly  different  from  polygons  in  comparison 
to  the  standard  algorithm  clustering.  While  form  shows  no 
main  effect,  stars  do  seem  to  be  affected  by  differences  in 
data  sets  more  than  polygons. 

The  third  analysis  involves  making  a  pairwise  comparison 
of  the  cluster  membership  in  each  cell  of  the  design  to  as¬ 
sess  the  consistency  with  which  subjects  clustered  the  data 
at  each  level  of  the  variables.  This  analysis  allows  a  com- 
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parison  of  the  effects  of  the  variables  without  resort  to 
some  outside  standard. 

Difference scores for each pair of clusterings in each
cell of the design were computed. The adjusted totals for
the cells are given in Table 9 and the ANOVA results in Table
10.
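
The pairing step can be sketched as follows. The difference score itself
is the one described earlier in this report; the stand-in used here, a
count of object pairs that one clustering puts together and the other
puts apart, is an assumption made only so the example runs, and the
function names are hypothetical. With six subjects in a cell (n = 6 in
Figure 15) the pairing yields 15 comparisons per cell.

```python
from itertools import combinations

def pair_disagreements(c1, c2):
    """Stand-in difference score between two clusterings.

    c1 and c2 map each object name to a cluster label. The score counts
    pairs of objects grouped together in one clustering but not the other.
    """
    return sum(
        (c1[x] == c1[y]) != (c2[x] == c2[y])
        for x, y in combinations(sorted(c1), 2)
    )

def cell_scores(clusterings):
    """Scores for every pair of subjects' clusterings in one cell."""
    return [pair_disagreements(a, b) for a, b in combinations(clusterings, 2)]

if __name__ == "__main__":
    first = {"Akron": "A", "Albany": "A", "Dallas": "B"}
    second = {"Akron": "A", "Albany": "B", "Dallas": "B"}
    print(pair_disagreements(first, second))
```

Because every subject is compared with every other subject in the same
cell, no outside standard such as a clustering algorithm is involved.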

In this analysis the data set again shows a significant
effect (p = 0.0001), with the car data being more consistently
clustered. Again the means show a high degree of variability
on both data sets. A simple effects test indicates
that the data set effect is significant for both forms,
though more so for stars (p = 0.0001; for polygons, p = 0.04).
Form and data set show an interaction effect, with stars
having the highest and lowest means on different data sets.
The interaction of form, data set, and background is also
significant. Data sets show an effect at both the second and
third levels of background in simple effects tests
(p = 0.0004 and p = 0.0001, respectively).

The background shows a significant effect in this analysis,
with the lowest differences at the simplest level. The
level of background is significant for stars and for
non-shaded figures in the simple effects tests at those levels
of the variables (p = 0.01 for both). The interaction of
background and the other visual variables and data sets is
also significant. From the means it can be seen that, for
stars,  the  performances  with  the  two  simpler  background  lev- 
TABLE 9


Adjusted Difference Totals from Pair Comparisons


                         Additional Information

                 Level 1            Level 2            Level 3
              (figure alone)      (with radii)       (with grid)

              Polygon   Star     Polygon   Star     Polygon   Star

City Data

  Shaded       6.52     6.95      6.86     6.29      5.38     9.10
  Non-shd.     4.86     7.62      8.05     8.67      8.14     9.05

Car Data

  Shaded       6.06     4.69      5.19     5.38      6.81     4.50
  Non-shd.     5.?3     4.63      6.00     4.69      5.94     5.81
TABLE 10 (Part a)


ANOVA Results from Pair Comparisons


Source              df      MS        F        p

A (add. info.)       2     0.144     3.46     0.036
S (shading)          1     0.104     2.50     0.118
AS                   2     0.061     1.47     0.237
Comb./Groups        84     0.042
F (form)             1     0.007     0.33     0.568
D (data)             1     1.290    52.80     0.0001
FD                   1     0.492    25.92     0.0001
AF                   2     0.026     1.27     0.286
AD                   2     0.028     1.14     0.325
SF                   1     0.007     0.35     0.556
SD                   1     0.100     4.07     0.047
ASF                  2     0.038     1.60     0.208
ASD                  2     0.059     2.19     0.118
AFD                  2     0.093     4.92     0.010
SFD                  1     0.001     0.06     0.805
ASFD                 2     0.144     7.56     0.001
F x Comb./Gr.       84     0.020
D x Comb./Gr.       84     0.024
FD x Comb./Gr.      84     0.019

TABLE 10 (Part b)


Means from Pair Comparisons


Added Information            Shading

Figure alone    0.39         Non-shd.    0.44
With radii      0.41         Shd.        0.40
With grid       0.46


Form                         Data Set

Polygon         0.41         Car         0.36
Star            0.42         City        0.48


Added Inform.     Shading     Form      Data

Figure alone      Non.        Poly.     Car      0.36
                  Non.        Poly.     City     0.33
                  Non.        Star      Car      0.31
                  Non.        Star      City     0.51
                  Shd.        Poly.     Car      0.40
                  Shd.        Poly.     City     0.43
                  Shd.        Star      Car      0.30
                  Shd.        Star      City     0.46

With radii        Non.        Poly.     Car      0.39
                  Non.        Poly.     City     0.54
                  Non.        Star      Car      0.30
                  Non.        Star      City     0.56
                  Shd.        Poly.     Car      0.34
                  Shd.        Poly.     City     0.43
                  Shd.        Star      Car      0.36
                  Shd.        Star      City     0.39

With grid         Non.        Poly.     Car      0.40
                  Non.        Poly.     City     0.54
                  Non.        Star      Car      0.39
                  Non.        Star      City     0.60
                  Shd.        Poly.     Car      0.45
                  Shd.        Poly.     City     0.36
                  Shd.        Star      Car      0.30
                  Shd.        Star      City     0.61
TABLE 10 (Part c)


Means from Pair Comparisons


Added Infor.      Data Set    Form

Figure alone      Car         Polygon     0.39
                  Car         Star        0.30
                  City        Polygon     0.38
                  City        Star        0.48

With radii        Car         Polygon     0.37
                  Car         Star        0.33
                  City        Polygon     0.48
                  City        Star        0.47

With grid         Car         Polygon     0.43
                  Car         Star        0.34
                  City        Polygon     0.45
                  City        Star        0.60


Form              Data Set

Polygon           Car         0.39
                  City        0.44
Star              Car         0.33
                  City        0.52


Shading           Data Set

Non-shd.          Car         0.36
                  City        0.51
Shaded            Car         0.36
                  City        0.44


els  were  closer  to  each  other  than  to  that  of  the  grid  pattern 
background  (means  of  0.39,  0.40,  and  0.47  for  the  levels). 
The  fact  that  the  internal  radii  and  the  circle  through  the 
means  provide  a  center  for  polygons,  but  that  stars  are  vis¬ 
ually  centered  by  the  nature  of  their  form,  may  explain  this 
similarity  of  performance.  As  in  the  previous  analyses,  the 
grid  pattern  generally  shows  higher  difference  scores. 

At the intermediate level of background information
shading shows lower mean differences on both forms, as it
does in the analyses of the comparison with the clustering
algorithms. Here, it also shows lower differences at the most
complex level. The effect of shading is not statistically
significant at either level, though (p = 0.15 and p = 0.14,
respectively). While showing generally lower means, shading
does not show a significant overall effect.

In general, the analysis of consistency between subjects
shows again the variance in the clusterings which were done.
The mean difference scores are again high. Clustering
consistency seems again to be related to the data set. Use of
a lower level of added background information and of shading
may be indicated, though the interaction effects make
generalization difficult. The subjects' ability to perceive
patterns and to compare one to others and the variation between
subjects' judgments seem to override most of the effects of
the variables.
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CHAPTER  FOUR 


CONCLUSIONS 


The  results  of  this  investigation  have  several  impli¬ 
cations  for  polygon  displays  when  used  for  tasks  involving 
clustering  or  categorization.  Although  the  results  of  the 
analyses  are  somewhat  inconsistent,  there  are  a  number  of 
points  which  came  to  light  in  this  exploratory  experiment. 

In designing or using polygon type graphic displays, we
should keep in mind that judgments of similarity may be
weighted toward distinctive, similar features of the figures.
Because of this tendency, decisions of membership in a
category may be made on the basis of a subset of the variables
rather than the overall figure. In situations where the task
might be adversely affected, such as in status displays or
identification tasks, the display should be so designed as
to minimize this effect. If the possible range and relation
of values are known, the variables could be so ordered as to
lessen the appearance of distinctive patterns, which might
distract from the overall shape. While training of those
using the display may be able to compensate, if this tendency
is based in a perceptual process, it may not be feasible or
even desirable to try to train individuals to avoid it.

In exploratory analysis this tendency may prove both
helpful and detrimental. It will allow perceptions of
patterns in the data which other methods of presentation may
overlook, or find with more difficulty. On the other hand,
such weighting of variables may hinder the observer from
perceiving less distinctive but important patterns in the
other variables. It would be interesting to ascertain the
interplay between the distinctiveness of the graphic pattern
and the relationships between the data points which these
patterns reflect. In the city data set used in this study,
for example, it was the distinctive shape of the polygons
which caused the grouping of four cities generally separated
by the standard clustering methods. There was some natural
reason for these cities to be grouped. The graphic
presentation allowed one to observe a relationship in the data
which did exist and which might have been overlooked using
other clustering methods. Further study could indicate with
which types of data or under what circumstances graphic
presentation might yield insights and where it might be
misleading.

When using polygons for cluster identification, the data
have an important effect. Although the form variable in this
experiment did not show an overall strong effect, there was
some indication that the polygon form was less affected by
differences in the data sets. From the frequency analysis
it was noted that subjects seem to group objects which are
members of distinct clusters with fair regularity; the groups
that overlap are less consistently clustered. Even if one
considers that these are not actually natural clusters, not
coming from different populations, subjects tended to group
those data points which were similar in the distribution.
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The  question  of  how  well  such  graphic  displays  work  for  ini¬ 
tial  determinations  of  the  numbers  of  clusters  needs  further 
investigation.  At  any  rate,  the  graphic  seems  to  work  best 
where  the  clusters  are  fairly  well  separated. 

The variability of subjects from a general population
sample seems to have been a problem in this study. To all
of the subjects this type of display was new. While they
seemed to understand the task and the explanation of how the
display could be used, none had actually done such a task
before. Many of the other studies have used subjects who
have probably had more experience in data analysis than did
the present subjects (e.g., Freni-Titulaer and Louv, 1984;
Wilkinson, 1982); Woods, Wise, and Hanes (1982) commented on
the unfamiliarity of their subjects with the polygon display
for safety parameters and its effect. Some exposure to the
display and the task would probably bring the variability
down. Training for consistency is feasible, involving
repeated clusterings of similar data sets, with feedback on
differences from earlier groupings.

For  display  design  there  is  some  indication,  at  least 
in  relation  to  the  consistency  with  clustering  algorithms 
results,  that  internal  radii  and  an  indication  of  means  are 
helpful  for  displays  using  the  usual  polygons,  and  that 
shading  may  be  beneficial.  That  the  added  information  should 
prove  helpful  has  an  intuitive  explanation  as  well.  The 
internal  radii  give  the  figure  a  center,  providing  a  basis 
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to  judge  relative  lengths  or  proportions  of  area  from  figure 
to  figure.  This  basis  may  help  to  explain  the  relatively 
good  performance  of  the  star  form,  especially  in  the  pair 
comparison.  Cleveland  and  McGill  (1984)  hypothesized  that 
judgments  of  length  are  higher  in  level  of  graphical  percep¬ 
tual  skill  than  of  area,  and  maybe  of  overall  pattern  as 
well. 

Finally  one  should  recognize  the  ability  of  subjects  to 
discern  the  pattern  of  the  figure.  More  study  is  needed  on 
the  intra-subject  consistency  in  the  use  of  polygon  displays, 
but  it  would  appear  that  to  a  certain  extent  the  perception 
of  patterns  overshadowed  the  effects  of  the  visual  variables 
in  this  experiment.  It  would  be  worthwhile  to  investigate 
whether  these  variables  affect  tasks  other  than  categori¬ 
zation,  such  as  information  integration. 

Polygon displays do have their advantages. They are
relatively simple to produce, are abstract in form and thus
unburdened by subjective perceptions, and seem to be
understandable to a wide population.
Appendix


Instructions


INSTRUCTIONS


The experiment in which you are participating is intended
to investigate the effect of certain visual characteristics
of a graphic display on the ability of people to use that
display. It is hoped that by identifying the optimal
characteristics for this display, designers can create more
efficient and effective displays.

This type of graphic display is used to portray complex
data in a simplified form. Suppose, for example, we take the
percentage of Republican votes in six Presidential elections
for nine States. Such data might look like this:
We can examine the groups to see the similarity between
the members. Notice that one group of States shows a
relatively consistent voting record over the years in this
example. The other group of States shows a distinct change in
voting pattern, especially in the 1964 election; those are
generally deep South States.

In a similar manner figures can be drawn from the
measurements of various attributes for any set of objects. Such
graphic analysis can be useful when dealing with large
amounts of data, to cluster or group the objects on the basis
of their similarity. One can then examine a smaller number
of groups more easily than the numerical data.

If you have any questions concerning this graphic
display, please feel free to ask the experimenter.
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Procedure: 


You will be presented with four different sets of
figures, one set at a time. The data for some of these sets
consist of measurements on nine different attributes of
various cities. These variables include levels of pollutants
(hydrocarbons, oxides of nitrogen, and sulfur dioxide), mean
temperature for January and July, precipitation, mean level
of education, population density, and mortality rate, though
not in this order. The other sets consist of measurements
on nine attributes of various car models, including mileage,
trunk space, length, repair record, rear seat room, price,
weight, and displacement. Two of the sets will have stars
drawn as in the preceding example; the figures in the other
sets will be drawn by connecting the data points directly,
forming a polygon. As with the stars, the data is
represented by the distance from the center. Below is a sample
of the polygon for the average values in the election example.


[Figure: sample polygon for the average values in the election
example]